--- layout: page title: Lab 6 - Diff In Diff permalink: /labs/lab6/ parent: Labs nav_order: 6 ---
The intuition of the DD strategy is to combine two simpler approaches. The first difference (before and after) eliminates unit-specific fixed effects. Then, the second difference eliminates time fixed effects. With this approach, we get an unbiased estimate of a (policy) intervention.
We learned that we can break down the difference between treated units and untreated units in post-treatment as the Average Treatment Effect Amongst the Treated (ATT), different time trends and selection bias. However, we can impose additional estimation assumptions to retrieve a credible estimate of the effect of treatment. The key assumption in diff-in-diff studies is the so-called Parallel Trends or Common Trends assumption. This assumption states that in the absence of the treatment/policy, we should expect to see that treated units would follow similar trends to the untreated ones. Unfortunately, we cannot test whether this assumption holds, but we can at least conduct some tests that would be indicated that the Parallel Trends holds.
We also learned that we can calculate the ATT in different ways. We learned that we can manually calculate the difference-in-difference estimator.
If you go back to the Golden Dawn, we can see how we retrieve the beta coefficient from the interaction.
\[gdper_{mt} = \beta_0 + \beta_1eventr_m + \beta_2post_t + \beta_3evertr_m \times post_t + u_{mt}\]
| Post = 0 | Post = 1 | |
|---|---|---|
| Treat = 0 | \(\beta_0 + u_{mt}\) | \(\beta_0 + \beta_2 + u_{mt}\) |
| Treat = 1 | \(\beta_0 +\beta_1 + u_mt\) | \(\beta_0 + \beta_1 + \beta_2 + \beta_3 + u_{mt}\) |
Then, we can get our estimate by the calculating the difference of the outcome variable for both treated and untreated units and, and then subtract these differences:
\[((\beta_0 + \beta_1 + \beta_2 + \beta_3 + u_{mt}) - (\beta_0 + \beta_1 + u_mt)) -((\beta_0 + \beta_2 + u_{mt})-(\beta_0 + u_{mt}))\] \[(\beta_2 + \beta_3) -(\beta_2)\] \[\beta_3\] - Unit and time dummies and a treatment indicator: Again, in the case of the Golden Dawn example, we can represent this estimation strategy using the following regression model:
\[gdper_{mt} = \beta_1treatment_{mt} + \alpha_m \text{unit dummy} + \gamma_t \text{time dummy} + u_{mt}\] For treated before treated \(Treatment = 0\):
\[gdper_{mt} = \beta_1\times 0 + \alpha_m \times 1 + \gamma_t \times + u_{mt}\] \[gdper_{mt} = \alpha_m + \gamma_t + u_{mt}\] For treated after treated \(Treatment = 1\):
\[gdper_{mt} = \beta_1 + \alpha_m + \gamma_t + u_{mt}\] Then, we can take the difference before and after:
\[(\beta_1 + \alpha_m + \gamma_t + u_{mt}) - (\alpha_m + \gamma_t + u_{mt})\] \[\beta_1\] Again we are levering the parallel trends assumption by assuming that time-trends are the same for both treated and untreated units.
Finally, we discussed inference and in particular standard errors.
Given that we have repeated observations, this type of data exhibits
serially correlated regressors and residuals, we need to make the
appropriate adjustments to calculate standard errors. One way is to
address this is to use clustered standard errors.
Before starting this seminar
Create a folder called “lab6”
Download the data (you can use this button or the one at the top, or read csv files directly from github):
Open an R script (or Markdown file) and save it in our “lab6” folder.
Set your working directory using the setwd() function or by clicking on “More“. For example setwd(“~/Desktop/Causal Inference/2024/Lab6”)
Let’s install an load the packages that we will be using in this lab:
library(jtools) # generate plots with regression coefficients
library(stargazer) # generate formated regression tables
library(texreg) # generate formatted regression tables
library(tidyverse) # to conduct some tidy operations
library(ggplot2) # to generate plots
library(plm) # conduct one-way and two-way fixed effects
library(estimatr) # to conduct ols and provides robust standard errors
library(lmtest) # calculates the variance-covariance matrix to be clustered by group.
library(sandwich) # to calculate heteroscedasticity-Consistent Covariance Matrix Estimation
library(haven) # upload dta files
#install.packages("wesanderson")
library("wesanderson") # Wes Anderson palette (Let's make our plots as Indie as possible)
#install.packages("modelsummary")
library(modelsummary)
In this seminar, we will cover the following
topics:
1. “Manually” calculate the difference-in-difference estimator”
2. Obtaining the difference in difference estimator using the
lm() function
3. Check for parallel trends
4. Conduct a fixed effect in difference in difference estimation using
both lm_robust() and plm() function.
4. Conduct a placebo test
The main question of this paper is the following: Did the influx of refugees in Greece increase support for the right-wing Golden Dawn party in 2015?
The authors exploit that the Aegean islands close to the Turkish border experienced sudden and drastic increases in the number of Syrian refugees while other islands slightly farther away—but with otherwise similar institutional and socio-economic characteristics—did not.
We can see here on the Figure in the left that level of exposure to refugees across the Aegean islands. The figure on the right shows how sudden was the influx of refugees.
post variable and coded as year 2016Now let’s familiarise ourselves with the data. A description of some of the variables that we will use today is below:
| Variable | Description |
|---|---|
Year |
Election Year from 2012 to 2016 |
municipality |
Municipality id |
post |
A dummy variable that switches on year 2016 |
treatment |
A dummy variable indicating if the municipality received refugees on the 2016 election |
evertr |
Treatment variable indicating if the municipality received refugees |
gdper |
Vote share for the Golden Dawn at the municipality level as percentage of total votes |
Now let’s load the data. There are two ways to do this:
You can load the dataset from your laptop using the
read_dta() function. We will call this data frame
greekislands.
# Set your working directory
#setwd("~/Desktop/Causal Inference/2024/Lab6")
#
#setwd("/cloud/project")
Or you can download the data from the course website from following url: https://widening-sarajevo.github.io/CI/data/greekislands.dta.
Let’s start by checking our data as always.
Exercise 1: Use the head() function to
familiarise yourself with the data set.
head(greekislands)
# A tibble: 6 × 10
# Groups: muni [2]
year muni treatment distance gdper post gddif evertr post2 treat
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2012 1 0 240. 7.98 0 NA 0 0 0
2 2013 1 0 240. 7.28 0 -0.705 0 0 0
3 2015 1 0 240. 6.36 0 NA 0 1 0
4 2016 1 0 240. 7.62 1 1.25 0 1 0
5 2012 2 0 318. 2.58 0 NA 0 0 0
6 2013 2 0 318. 4.28 0 1.70 0 0 0
This is what we would expect. Recall that the treat
variable indicates municipalities that are treated at some point -
independent of the timing -, while the treatment variable
marks municipalities after they were treated.
As we can see, the data set covers multiple election years. Before
working with the data, let’s make sure we know how many and which
elections are included in the data.
Exercise 2: How many elections, i.e. years are
covered by the data?
unique(greekislands$year)
[1] 2012 2013 2015 2016
Using the unique command, we find that four elections are
covered by the data: The elections that took place in 2012, 2013, 2015,
and 2016.
Let’s have a look at general voting patterns of the Golden Dawn over time, irrespective of treatment status of municipalities.
Exercise 3: Plot the vote share of the Golden Dawn Party
(gdper) over time.
There are several ways to do this. For instance, you could plot the
dispersion across municipalities by using the boxplot
command or, alternatively, calculate average values per year. Feel free
to pick the option you deem most appropriate.
# First option: Boxplot
boxplot(gdper ~ year,
data = greekislands,
xlab = "Election Year",
ylab = "Golden Dawn Vote Share",
las = 2) # This argument rotates the axis labels
# Second Option: Plot averages
# Calculating and storing means
average_data <- greekislands %>%
group_by(year) %>%
summarize(gd_averages = mean(gdper))
# Plotting means using ggplot
ggplot(average_data, aes(x = year, y = gd_averages)) +
geom_point() + geom_line() + xlab("Election Year") + ylab("Average GD Vote Share") + ylim(0,6.5)
We can se that the vote share for the Golden Dawn party has remained somewhat stable between the 2012 and 2013 elections, and dropped in the 2015 election to an average value (per municipality) of 4.46%. In the 2016 election, the party’s vote share rose substantively - to a new high of 6%.
Being aware of the general trends of the Golden Dawn’s vote share is
an important information about context and the party’s history. However,
it cannot tell us anything about the treatment effect we seek to
analyse: The arrival of refugees in some Greek municipalities in the
summer of 2015.
A naive observer might propose identifying this effect by looking at the differences between treated and untreated units in the post-treatment periods. Would this, however, be an appropriate representation of a possible treatment effect? It clearly would not! Comparing post-treatment differences only doesn’t allows us to account for unit/municipality-specific effects and voting patterns. Treatment, after all, was not assigned randomly. We would not be able to say what the effect of the treatment is unless we can make a statement about how the treatment changed the outcome or resulted in the diversion from a previous trajectory. Using a differences-in-differences design allows us to do so.
Exercise 4: Estimate the treatment effect by calculating
differences-in-differences between 2015 and 2016 using the
mean() function.
Hint: Calculate the differences between treated and untreated units for the years 2015 and 2016 first.
# Difference in means between treated and untreated in 2016 (pot-treatment).
post_difference <- mean(greekislands$gdper[greekislands$evertr == 1 & greekislands$year == 2016]) - mean(greekislands$gdper[greekislands$evertr == 0 & greekislands$year == 2016])
post_difference
[1] 2.744838
# Difference in means between treated and untreated in 2015 (pre-treatment).
pre_difference <- mean(greekislands$gdper[greekislands$evertr == 1 & greekislands$year == 2015]) - mean(greekislands$gdper[greekislands$evertr == 0 & greekislands$year == 2015])
pre_difference
[1] 0.6212737
# Now calculate the difference between the two differences above
diff_in_diff <- post_difference - pre_difference
diff_in_diff
[1] 2.123564
The difference in the Golden Dawn’s vote share between treated and untreated municipalities has increased in 2016. The differences-in-differences amount to
2.12. This suggests
that the treatment (i.e. the arrival of refugees) increased the vote
share of the golden dawn by roughly 2%-points in the affected
municipalities.
While it is important to understand what exactly the
difference-in-difference means, we usually do not have to calculate it
manually. In fact, we can simply use an OLS to estimate
differences-in-differences.
Exercise 5: Estimate the difference-in-difference between 2015 and 2016 using an OLS regression.
data = greekislands[greekislands$year>=2015,]
as argument for your OLS.
# Creating a post-treatment period (i.e. 2016) dummy
greekislands$post_treatment <- greekislands$year == 2016
ols_did <- lm_robust(gdper ~ evertr * post_treatment, data = greekislands[greekislands$year>=2015,])
summary(ols_did)
Call:
lm_robust(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >=
2015, ])
Standard error type: HC2
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower
(Intercept) 4.3854 0.2221 19.7409 1.045e-47 3.9471
evertr 0.6213 0.7996 0.7769 4.382e-01 -0.9561
post_treatmentTRUE 1.2737 0.3190 3.9935 9.333e-05 0.6445
evertr:post_treatmentTRUE 2.1236 1.3052 1.6270 1.054e-01 -0.4511
CI Upper DF
(Intercept) 4.824 188
evertr 2.199 188
post_treatmentTRUE 1.903 188
evertr:post_treatmentTRUE 4.698 188
Multiple R-squared: 0.1762 , Adjusted R-squared: 0.163
F-statistic: 9.036 on 3 and 188 DF, p-value: 1.279e-05
library(broom)
print(summary(ols_did),digits=max(3, getOption("digits") - 3))
Call:
lm_robust(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >=
2015, ])
Standard error type: HC2
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower
(Intercept) 4.3854 0.2221 19.7409 1.045e-47 3.9471
evertr 0.6213 0.7996 0.7769 4.382e-01 -0.9561
post_treatmentTRUE 1.2737 0.3190 3.9935 9.333e-05 0.6445
evertr:post_treatmentTRUE 2.1236 1.3052 1.6270 1.054e-01 -0.4511
CI Upper DF
(Intercept) 4.824 188
evertr 2.199 188
post_treatmentTRUE 1.903 188
evertr:post_treatmentTRUE 4.698 188
Multiple R-squared: 0.1762 , Adjusted R-squared: 0.163
F-statistic: 9.036 on 3 and 188 DF, p-value: 1.279e-05
The estimate for the interaction term, i.e. treated units after they
were treated, corresponds to the difference-in-differences. As you can
see, the OLS provides exactly the same point estimate as the manual
calculation of the difference-in-differences, 2.12 -
however, the OLS also provides measures of uncertainty, showing us that
the estimate is not significant to conventional levels. The OLS provides
quite intuitive estimates: The intercept corresponds to the Golden
Dawn’s vote share in in untreated municipalities in 2015. Treated
municipalities had, on average, a higher vote share for the party by
roughly 0.62%points. The parties vote share increased by
about 1.27%-points in untreated municipalities in 2016.
Note that there are multiple ways to calculate the
difference-in-differences. Fixed effects are usually the preferred
approach as they provide the most flexible (e.g. in terms of multiple
time periods) and most efficient estimator. Let’s therefore calculate a
fixed effects model for the same difference-in-differences as in
Exercise 5.
Exercise 6: Estimate the difference-in-differences between
2015 and 2016 using a Fixed Effects regression.
Hint: You can use either lm_robust(),
plm() or lm() with dummy variables.
# lm_robust
fe_robust <- lm_robust(gdper ~ evertr*post_treatment, fixed_effects= factor(muni), data=greekislands[greekislands$year>=2015,])
summary(fe_robust)
Call:
lm_robust(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >=
2015, ], fixed_effects = factor(muni))
Standard error type: HC2
Coefficients: (1 not defined because the design matrix is rank deficient)
Estimate Std. Error t value Pr(>|t|) CI Lower
evertr NA NA NA NA NA
post_treatmentTRUE 1.274 0.09139 13.938 1.333e-24 1.092
evertr:post_treatmentTRUE 2.124 0.39686 5.351 6.152e-07 1.336
CI Upper DF
evertr NA NA
post_treatmentTRUE 1.455 94
evertr:post_treatmentTRUE 2.912 94
Multiple R-squared: 0.9651 , Adjusted R-squared: 0.9292
Multiple R-squared (proj. model): 0.7791 , Adjusted R-squared (proj. model): 0.5511
F-statistic (proj. model): NA on 2 and 94 DF, p-value: NA
# plm
fe_plm <- plm(gdper ~ evertr*post_treatment, model = "within", index = c("muni"), data=greekislands[greekislands$year>=2015,])
summary(fe_plm)
Oneway (individual) effect Within Model
Call:
plm(formula = gdper ~ evertr * post_treatment, data = greekislands[greekislands$year >=
2015, ], model = "within", index = c("muni"))
Balanced Panel: n = 96, T = 2, N = 192
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-1.28954 -0.27782 0.00000 0.27782 1.28954
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
post_treatmentTRUE 1.273734 0.099336 12.8225 < 2.2e-16 ***
evertr:post_treatmentTRUE 2.123564 0.280964 7.5581 2.67e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 176.35
Residual Sum of Squares: 38.957
R-Squared: 0.77909
Adj. R-Squared: 0.55113
F-statistic: 165.754 on 2 and 94 DF, p-value: < 2.22e-16
fe_dummy <- lm(gdper ~ evertr * post_treatment + factor(muni), data = greekislands[greekislands$year>=2015,])
summary(fe_dummy)
Call:
lm(formula = gdper ~ evertr * post_treatment + factor(muni),
data = greekislands[greekislands$year >= 2015, ])
Residuals:
Min 1Q Median 3Q Max
-1.2895 -0.2778 0.0000 0.2778 1.2895
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.35368 0.45792 13.875 < 2e-16 ***
evertr -2.69291 0.65892 -4.087 9.20e-05 ***
post_treatmentTRUE 1.27373 0.09934 12.822 < 2e-16 ***
factor(muni)2 -3.78604 0.64377 -5.881 6.20e-08 ***
factor(muni)3 -1.55751 0.64377 -2.419 0.017472 *
factor(muni)4 -3.10872 0.64377 -4.829 5.31e-06 ***
factor(muni)5 -1.22008 0.64377 -1.895 0.061138 .
factor(muni)6 1.06490 0.64377 1.654 0.101431
factor(muni)7 -1.13881 0.64377 -1.769 0.080143 .
factor(muni)8 -4.10152 0.64377 -6.371 6.88e-09 ***
factor(muni)9 -1.05344 0.64377 -1.636 0.105109
factor(muni)10 -2.66002 0.64377 -4.132 7.80e-05 ***
factor(muni)11 0.54228 0.64377 0.842 0.401730
factor(muni)12 -2.99074 0.64377 -4.646 1.10e-05 ***
factor(muni)13 -6.99054 0.64377 -10.859 < 2e-16 ***
factor(muni)14 -2.18372 0.64377 -3.392 0.001016 **
factor(muni)15 -4.63345 0.64377 -7.197 1.48e-10 ***
factor(muni)16 -4.98236 0.64377 -7.739 1.12e-11 ***
factor(muni)17 -5.23368 0.64377 -8.130 1.70e-12 ***
factor(muni)18 -3.42812 0.64377 -5.325 6.86e-07 ***
factor(muni)19 -3.48572 0.64377 -5.415 4.69e-07 ***
factor(muni)20 0.73748 0.64377 1.146 0.254884
factor(muni)21 -0.23429 0.64377 -0.364 0.716720
factor(muni)22 2.13230 0.64377 3.312 0.001315 **
factor(muni)23 -1.23974 0.64377 -1.926 0.057158 .
factor(muni)24 -3.75036 0.64377 -5.826 7.92e-08 ***
factor(muni)25 -0.93581 0.64377 -1.454 0.149375
factor(muni)26 0.44822 0.64377 0.696 0.487996
factor(muni)27 -3.71715 0.64377 -5.774 9.93e-08 ***
factor(muni)28 -2.61289 0.64377 -4.059 0.000102 ***
factor(muni)29 -2.46477 0.64377 -3.829 0.000232 ***
factor(muni)30 -4.21976 0.64377 -6.555 2.97e-09 ***
factor(muni)31 0.26073 0.64377 0.405 0.686390
factor(muni)32 -3.01851 0.64377 -4.689 9.29e-06 ***
factor(muni)33 -2.79332 0.64377 -4.339 3.60e-05 ***
factor(muni)34 -0.28907 0.64377 -0.449 0.654451
factor(muni)35 -0.43482 0.64377 -0.675 0.501058
factor(muni)36 -2.67040 0.64377 -4.148 7.35e-05 ***
factor(muni)37 -0.17064 0.64377 -0.265 0.791537
factor(muni)38 2.30562 0.64377 3.581 0.000543 ***
factor(muni)39 -0.01353 0.64377 -0.021 0.983281
factor(muni)40 -3.86558 0.64377 -6.005 3.59e-08 ***
factor(muni)41 -1.47426 0.64377 -2.290 0.024255 *
factor(muni)42 -1.19513 0.64377 -1.856 0.066522 .
factor(muni)43 2.65292 0.64377 4.121 8.12e-05 ***
factor(muni)44 -2.72179 0.64377 -4.228 5.47e-05 ***
factor(muni)45 -1.81245 0.64377 -2.815 0.005936 **
factor(muni)46 1.70164 0.64377 2.643 0.009621 **
factor(muni)47 1.11290 0.64377 1.729 0.087143 .
factor(muni)48 -1.44698 0.64377 -2.248 0.026936 *
factor(muni)49 -0.92296 0.64377 -1.434 0.154981
factor(muni)50 -1.76604 0.64377 -2.743 0.007285 **
factor(muni)51 -3.26407 0.64377 -5.070 1.99e-06 ***
factor(muni)52 -3.88610 0.64377 -6.036 3.11e-08 ***
factor(muni)53 -1.08901 0.64377 -1.692 0.094034 .
factor(muni)54 9.36023 0.64377 14.540 < 2e-16 ***
factor(muni)55 -4.05763 0.64377 -6.303 9.38e-09 ***
factor(muni)56 -3.62824 0.64377 -5.636 1.81e-07 ***
factor(muni)57 -0.74727 0.64377 -1.161 0.248673
factor(muni)58 -4.08326 0.64377 -6.343 7.83e-09 ***
factor(muni)59 -1.07415 0.64377 -1.669 0.098538 .
factor(muni)60 -3.56730 0.64377 -5.541 2.73e-07 ***
factor(muni)61 -3.65170 0.64377 -5.672 1.55e-07 ***
factor(muni)62 -4.22268 0.64377 -6.559 2.91e-09 ***
factor(muni)63 -0.84999 0.64377 -1.320 0.189933
factor(muni)64 -2.15650 0.64377 -3.350 0.001165 **
factor(muni)65 -1.77899 0.64377 -2.763 0.006883 **
factor(muni)66 -0.99105 0.64377 -1.539 0.127055
factor(muni)67 0.55694 0.64377 0.865 0.389177
factor(muni)68 -2.68379 0.64377 -4.169 6.81e-05 ***
factor(muni)69 -0.13410 0.64377 -0.208 0.835441
factor(muni)70 2.82223 0.64377 4.384 3.04e-05 ***
factor(muni)71 -2.23698 0.64377 -3.475 0.000775 ***
factor(muni)72 4.37066 0.64377 6.789 1.01e-09 ***
factor(muni)73 -0.05003 0.64377 -0.078 0.938224
factor(muni)74 -2.14293 0.64377 -3.329 0.001247 **
factor(muni)75 -3.96392 0.64377 -6.157 1.81e-08 ***
factor(muni)76 -4.56502 0.64377 -7.091 2.45e-10 ***
factor(muni)77 2.60295 0.64377 4.043 0.000108 ***
factor(muni)78 -2.79773 0.64377 -4.346 3.51e-05 ***
factor(muni)79 0.45699 0.64377 0.710 0.479545
factor(muni)80 -1.29930 0.64377 -2.018 0.046413 *
factor(muni)81 -1.49319 0.64377 -2.319 0.022537 *
factor(muni)82 3.19897 0.64377 4.969 3.01e-06 ***
factor(muni)83 -1.54715 0.64377 -2.403 0.018213 *
factor(muni)84 -0.44015 0.64377 -0.684 0.495840
factor(muni)85 -4.15859 0.64377 -6.460 4.59e-09 ***
factor(muni)86 0.32742 0.64377 0.509 0.612231
factor(muni)87 -2.77789 0.64377 -4.315 3.94e-05 ***
factor(muni)88 -3.95815 0.64377 -6.148 1.89e-08 ***
factor(muni)89 -5.22449 0.64377 -8.115 1.83e-12 ***
factor(muni)90 -1.68049 0.64377 -2.610 0.010525 *
factor(muni)91 NA NA NA NA
factor(muni)92 1.09935 0.64377 1.708 0.090996 .
factor(muni)93 -1.90673 0.64377 -2.962 0.003873 **
factor(muni)94 -2.30515 0.64377 -3.581 0.000545 ***
factor(muni)95 -3.84034 0.64377 -5.965 4.27e-08 ***
factor(muni)96 -2.36403 0.64377 -3.672 0.000399 ***
evertr:post_treatmentTRUE 2.12356 0.28096 7.558 2.67e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6438 on 94 degrees of freedom
Multiple R-squared: 0.9651, Adjusted R-squared: 0.9292
F-statistic: 26.83 on 97 and 94 DF, p-value: < 2.2e-16
plm models render statistically significant
results.
Let’s now extend our analysis by including all pre-treatment periods
in our analysis. The easiest way to do so is running a two-way fixed
effects regression.
Exercise 7: Estimate the difference-in-differences using a
two-way Fixed Effects regression with all time periods and
treatment as independent variable.
twoway_FE <- plm(gdper ~ treatment, effect="twoways",model="within",index=c("muni","year"), data=greekislands)
summary(twoway_FE)
Twoways effects Within Model
Call:
plm(formula = gdper ~ treatment, data = greekislands, effect = "twoways",
model = "within", index = c("muni", "year"))
Balanced Panel: n = 96, T = 4, N = 384
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-4.5964070 -0.5205535 -0.0089621 0.4459476 7.0013287
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
treatment 2.08700 0.39328 5.3066 2.253e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 380.22
Residual Sum of Squares: 345.92
R-Squared: 0.090211
Adj. R-Squared: -0.22693
F-statistic: 28.1603 on 1 and 284 DF, p-value: 2.2526e-07
Exercise 8: Calculate robust standard errors for (i) the plm
FE model, (ii) the two-way FE model and present the regression output in
a single table. Include the simple OLS model in the
table.
Note: There is no need to adjust standard errors after using
lm_robust() as the command automatically does
that.
### ADJUSTING STANDARD ERRORS
## plm FE model
fe_plm_se <- coeftest(fe_plm, vcov = vcovHC(fe_plm, type = "HC1",
cluster = "group"))
fe_plm_se
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
post_treatmentTRUE 1.273734 0.091318 13.9483 < 2.2e-16 ***
evertr:post_treatmentTRUE 2.123564 0.382751 5.5482 2.649e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## two-way FE model
twoway_FE_se <- coeftest(twoway_FE, vcov = vcovHC(twoway_FE, type = "HC1",
cluster = c("group", "time")))
twoway_FE_se
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
treatment 2.08700 0.34699 6.0147 5.535e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## CREATING A REGRESSION TABLE
screenreg(l=list(ols_did, fe_plm_se, twoway_FE_se), custom.header = list("OLS (2-period)" = 1, "FE (2-period)" = 2, "2-way FE" = 3), include.ci=c(FALSE, FALSE, FALSE), custom.coef.map = list("treatment" = "Treatment", "evertr:post_treatmentTRUE" = "Treatment (Interaction)"))
======================================================
OLS (2- FE (2-per 2-way FE
------- --------- ---------
Model 1 Model 2 Model 3
------------------------------------------------------
Treatment 2.09 ***
(0.35)
Treatment (Interaction) 2.12 2.12 ***
(1.31) (0.38)
------------------------------------------------------
R^2 0.18
Adj. R^2 0.16
Num. obs. 192
RMSE 2.21
======================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
As we know, we cannot test whether the parallel trends, but we can conduct some visual inspects and statistical analyses. One way to do this is to plot the outcome variable for both treated and untreated units overtime before the intervention.
Let’s first get the mean vote share for the Golden Dawn for the
treated and untreated units. Let’s do using Tidyverse using the
group_by() and summarise() functions.
Exercise 9: Calculate the mean vote share for the Golden Dawn
for the treated and the untreated units for all the elections. Store
this into a new data frame and called plot.parallel. Use
tidy to create this new data frame.The syntax and the description of the
functions below:
new.data <- data %>%
group_by(variable_1, variable_2) %>%
summarise(new_variable = mean(outcome, na,rm = TRUE)) %>%
mutate(condition = ifelse(evertr == 1, "Treatment", "Control") )
| Function | Description |
|---|---|
new data <- dataframe |
Assignment operator where the new data frame will be stored |
group_by() |
Group observations by a variable or set of variables |
summarise() |
Creates a new data frame with one ore more rows of each combination of grouping variables |
mutate() |
Allows to create new variable or modify existing ones |
plot.parallel <- greekislands %>%
group_by(evertr, year) %>%
summarise(vote = mean(gdper)) %>%
mutate(condition = ifelse(evertr == 1, "Treatment", "Control"))
head(plot.parallel)
# A tibble: 6 × 4
# Groups: evertr [2]
evertr year vote condition
<dbl> <dbl> <dbl> <chr>
1 0 2012 5.49 Control
2 0 2013 5.34 Control
3 0 2015 4.39 Control
4 0 2016 5.66 Control
5 1 2012 6.16 Treatment
6 1 2013 6.02 Treatment
We can see now that we have the average vote share for treated and
untreated for every election year. For example, we can see that the
average vote share for the Golden Dawn in untreated islands was
5.33 for the 2013 election.
Right, so we see that we have the average vote share for treated and untreated units for each election from 2012 to 2016. Now, let’s plot the trends before and after the intervention.
Exercise 10: Plot the parallel trends. Set vote share for the Golden Dawn in the y axis, year in the x-axis. Connect the data points of the two groups using a line. Place the legend of your plot at the bottom. Change the default colour and use the Wes Anderson Palette. Below you will find all the functions necessary to generate this plot. Remember to use the plus sign between functions.
| Function | Description |
|---|---|
ggplot(x = , y =, colour = ) |
map data components into the graph |
geom_point() |
To generate a scatterplot |
scale_color_manual(values=wes_palette("Royal1")) |
Replace the default palette |
theme(legend.position = "bottom") |
To place the legend at the bottom |
geom_line(aes(x=year,y=vote, color = condition, group = condition)) |
To connect the dots with lines by group |
ggplot(aes(x = year, y = vote, colour = condition), data = plot.parallel) +
geom_point() + scale_color_manual(values=wes_palette("Royal1")) + theme(legend.position = "bottom") + geom_line(aes(x=year,y=vote, color = condition, group = condition))
The plot suggests that parallel trends assumption would hold in this case. The vote share for the Golden Dawn follows a similar path for treated and untreated units for at least three elections before the intervention.
Now let’s look at the leads to identify any anticipatory effects. Let’s imagine that the Golden Dawn back in 2012 believed that there was going to be a major humanitarian in the future. Then, they thought that they could exploit this situation to increase their electoral gains. In that case, we wouldn’t be able to disentangle whether changes in vote share are due to the previous campaigning efforts on the part of the Golden Dawn or due to the influx of alyssum seekers to Greece. We can use leads to identify if there are any anticipatory effects. If we find systematic differences between treated and untreated units, this would suggest that units in one or both groups are responding to the treatment before receiving it.
Exercise 11: Create dummy year variables equal to 1 for every
year and only for the treated municipality. Call this variable leads,
plus the year. For example, the lead2012, will take value 1 only for
treated municipalities and only for observations of these municipalities
in the year 2012. You can see an example below. use the
mutate() function to create these new variables. You can
also use the ifelse() function to create these dummy
variables. The syntax of the ifelse() function is the
following:
new variable = ifelse(condition, "value if condition is met", "value if the condition is not met").
Create these dummy variables for the elections in 2012, 2013, and
2015.
greekislands <- greekislands %>%
mutate(lead2012 = ifelse(evertr == 1 & year == 2012, 1, 0))
table(greekislands$year)
2012 2013 2015 2016
96 96 96 96
greekislands <- greekislands %>%
mutate(lead2012 = ifelse(evertr == 1 & year == 2012, 1, 0),
lead2013 = ifelse(evertr == 1 & year == 2013, 1, 0),
lead2015 = ifelse(evertr == 1 & year == 2015, 1, 0))
Now that we have created these dummy variables, we can run a two-way fixed effect model and see if they are anticipatory effects.
Exercise 12: Conduct the same two-way fixed-effect model that
we used before, but rather than using the treatment
variable, replace this variable with the new leads variables that you
created. Ran separate estimations for each lead. Store the outputs of
these regressions into different objects. Does the evidence suggest that
there are any anticipatory effects? Are the results of these three
models statistically significant? You can use the summary() or
screenreg() functions to take a look at your results.
lead2012 <- plm(gdper ~ lead2012, model = "within", effect = "twoways",
index = c("muni", "year"), data = greekislands)
lead2013 <- plm(gdper ~ lead2013, model = "within", effect = "twoways",
index = c("muni", "year"), data = greekislands)
lead2015 <- plm(gdper ~ lead2015, model = "within", effect = "twoways",
index = c("muni", "year"), data = greekislands)
summary(lead2015) # as an example.
Twoways effects Within Model
Call:
plm(formula = gdper ~ lead2015, data = greekislands, effect = "twoways",
model = "within", index = c("muni", "year"))
Balanced Panel: n = 96, T = 4, N = 384
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-4.600977 -0.530094 0.017942 0.464218 7.089811
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
lead2015 -0.74442 0.40995 -1.8159 0.07044 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 380.22
Residual Sum of Squares: 375.86
R-Squared: 0.011477
Adj. R-Squared: -0.33311
F-statistic: 3.29745 on 1 and 284 DF, p-value: 0.070442
The evidence gathered from the three models suggests there are no anticipatory effects. We don’t find any statistically significant dynamics between treated and untreated municipalities on vote share for the Golden Dawn before the 2016 election, which is good!
Now, let’s plot all the two-way fixed effects models where we used
the plm() function into a single figure.
Exercise 13: Plot the coefficients from the leads models,
plus the two-way fixed model for the 2016 election that you used in
Question 7 (“twoway_FE”). Use the plot_coef() function to
generate this plot. Add the argument scale and set it equal
to TRUE. Also, include the argument robust and
set it equal to TRUE. In addition to the
plot_coefs() include the coord_flip()
function. This function will flip the Cartesian coordinates of the plot,
so we have the models (years) in the x-axis and the coefficients in the
y axis. Remember to add plus sign operator between two functions. You
can also add the xlab("Year") function to a label in the
x-axis.
plot_coefs(lead2012, lead2013, lead2015, twoway_FE, scale = TRUE, robust = TRUE) +
coord_flip() + ylab("Year")
Again, we can observe that pre-treatment coefficients are nearly zero
and they are not statistically significant, but then for the 2015
election, we find that the vote share for Golden Dawn has risen
substantially. Thus, we found strong evidence that the increase in vote
share is caused by the influx of refugees into the Greek islands.
We can conduct a placebo test to evaluate whether the parallel trend holds. We are trying to prove that there is no clear difference in trending tendencies between treated and untreated municipalities.
The steps to conduct are the following:
plm() or
lm_robust() function.Exercise 14: Drop all the observations of the year of the
intervention (2016). Do this using the filter() function.
Then, create a fake treatment variable and call it post2
and set it equal to 1 for all observations in year 2015. This variable
would indicate as the hypothetical case that the municipality would
received refugees in 2015. You can see an example below.
greekislands <- greekislands %>%
filter(variable != 2016) %>%
mutate(new variable = year variable == "year")
## or alternatively
greekislands <- greekislands %>%
filter(variable != 2016) %>%
mutate(new.variable = ifelse(year variable == "year", 1, 0))
table(greekislands$year)
2012 2013 2015 2016
96 96 96 96
greekislands <- greekislands %>%
filter(year != 2016) %>%
mutate(post2 = ifelse(year == "2015", 1, 0))
Great! Now let’s use this dummy variable to perform the placebo test. We are essentially going to test is that there are no differential trends between treated and untreated, as we did before when we looked at anticipatory trends.
Exercise 15: Conduct the same two-way fixed effect model
using the lm() and use the post2 variable. Store all the
models in a list and plug this list inside of the
modelsummary(list) function to report your results. Did you
find statistically significant differences between treated and untreated
units pre-treatment?. You can see an example of how to store multiple
models in a list list(). Also, subset the data so one model
will only conduct the placebo test using the 2012 and 2015 elections,
and another model only using the observations from the 2013 and 2015
elections.
models <- list(
"plm "= lm(outcome ~ treatment, model = "within", effect = "twoways", index = c("muni", "year"), data = data[data$year != year,]),
"lm" = lm(outcome ~ outcome, data = data, subset=(year!=year)), # set year equal to year that you want to exclude.
"lm_robust" = lm_robust(outcome ~ treatment, data = data[data$year != year,], cluster = unit)) # set year equal to year that you want to exclude.
models <- list(
"1215"=lm_robust(gdper ~ evertr + post2+evertr*post2, data = greekislands, subset=(year!=2013), cluster=muni), # placebo test 2012 and 2015
"1314" = lm_robust(gdper ~ evertr + post2+evertr*post2, data = greekislands, subset=(year!=2012),cluster=muni)) # placeblo test 2013 and 2014
modelsummary(models)
| 1215 | 1314 | |
|---|---|---|
| (Intercept) | 5.491 | 5.339 |
| (0.334) | (0.261) | |
| evertr | 0.668 | 0.684 |
| (1.127) | (0.651) | |
| post2 | −1.106 | −0.954 |
| (0.227) | (0.126) | |
| evertr × post2 | −0.047 | −0.063 |
| (0.601) | (0.403) | |
| Num.Obs. | 192 | 192 |
| R2 | 0.048 | 0.054 |
| R2 Adj. | 0.033 | 0.038 |
| Std.Errors | by: muni | by: muni |
We find that there are no systematic differences between treated and untreated municipalities. We find near to zero coefficients.